FedData packageIn one of our recent posts we discussed downloading Census data using the acs and tigris libraries in R. With a relatively new package, FedData, we can download additional government datasets within a user-defined area of interest. The following five sources of data are currently available for download with future ones in the works:
* The National Hydrography Dataset (USGS)
* The National Elevation Dataset digital elevation models (1 and 1/3 arc-second; USGS)
* The Soil Survey Geographic (SSURGO) database from the National Cooperative Soil Survey (NCSS)
* The Global Historical Climatology Network (GHCN), coordinated by National Climatic Data Center at NOAA
* The International Tree Ring Data Bank (ITRDB), coordinated by National Climatic Data Center at NOAA
For this particular post we’re focusing on the GHCN and NED datasets.
library(FedData) # for downloading Federal data
library(ggmap) # for choosing our area of interest
library(raster) # for defining extents and raster processing
library(sp) # for working with spatial objects
library(dplyr) # for awesomeness
library(leaflet) # for mapping
Before downloading any data we must first define our map extent or template. The template can be a raster or spatial object used to identify your area of interest. Based on the FedData documentation an undefined template indicates that all data for a particular dataset will be downloaded – which could be very time consuming! We recommend choosing a small area of interest to begin with :)
To define your extent and depending on your data needs you could read a shapefile into R using the rgdal library or choose a Census-designated boundary using the tigris or acs library. For this particular demonstration however we are going to be a little lazy and use the ggmap library to grab our extent. We’ll focus our data needs on our lovely Ithaca, NY!
ithaca<-get_map("Ithaca, NY", zoom=11)
ggmap(ithaca)
Using our ggmap object pull out the bounding box coordinates using the attr function.
bb<-attr(ithaca, "bb")
bb
## ll.lat ll.lon ur.lat ur.lon
## 1 42.28135 -76.72126 42.60564 -76.28181
The FedData package requires that our extent template be a bounding box matrix, not the current ggmap data.frame. The matrix should be a two-column matrix; the first column has the minimum, the second the maximum values; rows represent the spatial dimensions. We can easily create this by doing some simple re-arranging.
bb.mat<-bbox(matrix(bb[,c("ur.lon", "ll.lon", "ur.lat", "ll.lat")], nrow=2, ncol=2))
bb.mat
## min max
## x -76.72126 -76.28181
## y 42.28135 42.60564
Now we’ll create an extent object using the raster library. Keep in mind that the extent function can be used on both raster and spatial objects. Here we’ll create a ‘SpatialPolygon’ using the extent function.
bb.mat<-as(extent(bb.mat), "SpatialPolygons")
proj4string(bb.mat)<-"+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
Let’s check to make sure our bounding box is what we want.
# Perfect!
ggmap(ithaca) + geom_polygon(aes(x=long, y=lat), size=3, color='purple',
data=bb.mat, fill=NA)
SpatialPolygonsDataFrameIn our testing we found that the bounding box matrix was only working as our extent template if it was a ‘SpatialPolygonsDataFrame’ object, NOT a ‘SpatialPolygons’ object. Knowing this let’s convert our ‘bb.mat’ into a new ‘SpatialPolygonsDataFrame’.
# Get the ID slot of the 'SpatialPolygon' object
getSpPPolygonsIDSlots(bb.mat)
## [1] "1"
# You can also do this
sapply(slot(bb.mat, "polygons"), function(x) slot(x, "ID"))
## [1] "1"
# Assign the same ID to the 'SpatialPolygonsDataFrame'
bb.fin<-SpatialPolygonsDataFrame(bb.mat, data.frame(N = c("1"), row.names = c("1")))
# NOTE: We discovered this after writing the
# above lines. This function which is part of
# the `FedData` package will easily convert
# your SP to a SPDF. Much easier!
bb.fin<-spdf_from_polygon(bb.mat)
Now that we have our extent defined let’s download some data! Keep in mind that based on your extent this step might take a few minutes. Also you may want to set a working directory, otherwise the data will be downloaded to your default directory.
setwd("D:/work_big_current/blog/feddata_tompkins")
# Download NED elevation data. This will return a raster
# of elevation data cropped to our extent area. Res = "1"
# refers to 1 arc-second NED. This is the default. To
# download 1/3 arc-second NED use res = "13". I'm adding
# force.redo = F to both download functions. This will
# check to see if the data already exists in which case
# it will not re-download it.
ned_ithaca<-get_ned(template=bb.fin, label="ned_ithaca", res="1", force.redo = F)
# Download GHCN daily climate data. We're interested in
# precipitation (PRCP) but many other climate-related
# elements exist (i.e. max and min temp, snowfall, etc).
# See the `FedData` documentation for more details. Also
# of note is that a character vector of station IDs can
# be used in place of our 'bb.fin' extent object -- I
# did not test this.
ghcn_ithaca<-get_ghcn_daily(template=bb.fin, label="ghcn_ithaca",
elements="prcp", force.redo = F)
Using the base R plot function take a quick look at the data.
# Not pretty but otherwise so far so good!
plot(ned_ithaca)
plot(ghcn_ithaca$spatial, add=TRUE)
As you may have noticed in the above plot, we access the “spatial” component of the GHCN data for adding points to the map. The other component of the GHCN download is the “tabular” piece. This is where the actual climate data is stored. Here we’ll show the points again but this time using our Ithaca map and ggplot2.
# Plot the data using ggmap and ggplot2. Note
# that we convert our spatial coordinates to a
# data.frame for plotting with ggmap.
ghcn_pts<-data.frame(ID = ghcn_ithaca$spatial$ID,
lat = ghcn_ithaca$spatial@coords[,2],
long = ghcn_ithaca$spatial@coords[,1])
ghcn_pts<-mutate(ghcn_pts, ID = as.character(as.factor(ID)))
head(ghcn_pts)
## ID lat long
## 1 US1NYSY0001 42.3907 -76.7176
## 2 US1NYTM0002 42.5666 -76.5700
## 3 US1NYTM0003 42.4317 -76.4886
## 4 US1NYTM0004 42.5913 -76.3720
## 5 US1NYTM0005 42.5253 -76.3218
## 6 US1NYTM0006 42.3520 -76.3184
# Plot the points again. Looking a little better!
ggmap(ithaca) + geom_point(aes(x=long, y=lat),
size=3, color='red', data=ghcn_pts)
Now let’s have a look at the actual precipitation values. The GHCN output returns a large list where each list element is an individual station and each station contains a data.frame of precipitation values. If we had downloaded multiple climate elements there would be multiple tables associated with each station.
ghcn_dat<-ghcn_ithaca$tabular
# Pull out the PRCP tables. If there were other
# climate elements we could access these in a
# similar way (i.e. ghcn_dat, "[[", "TMIN")
prcp<-lapply(ghcn_dat, "[[", "PRCP")
# We're only interested in the most recent year of
# complete data. Let's filter to 2015.
prcp.2015<-lapply(prcp, function(x) filter(x, YEAR == 2015))
head(prcp.2015[[1]])
## YEAR MONTH D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 D13 D14 D15 D16 D17
## 1 2015 1 0 0 13 66 38 0 8 0 13 5 0 13 36 0 0 0 8
## 2 2015 2 0 102 38 5 71 0 0 8 NA 41 0 10 13 3 51 8 0
## 3 2015 3 10 15 0 64 3 0 0 8 0 0 23 0 0 5 0 13 8
## 4 2015 4 0 0 20 97 8 20 3 74 112 81 5 0 0 145 0 0 13
## 5 2015 5 0 0 0 0 0 18 0 0 0 0 43 188 64 3 0 124 66
## 6 2015 6 94 NA NA NA NA NA NA NA NA NA 0 0 91 0 678 3 76
## D18 D19 D20 D21 D22 D23 D24 D25 D26 D27 D28 D29 D30 D31
## 1 0 0 10 0 5 0 0 3 0 71 3 0 23 8
## 2 0 48 0 0 20 5 0 0 0 3 0 NA NA NA
## 3 8 0 0 23 15 8 0 0 0 150 8 0 0 18
## 4 0 0 36 147 0 20 NA NA NA NA NA 0 0 NA
## 5 0 244 3 0 0 0 0 0 0 0 0 0 0 33
## 6 0 3 0 51 69 0 46 0 0 0 399 61 10 NA
# Roughly half of our stations have no data for 2015.
# Remove these from the list.
prcp.2015<-prcp.2015[lapply(prcp.2015, nrow) > 0]
Summarize the data so we have annual averages for each station. Here we’ll calculate 1) monthly averages and 2) annual averages.
NOTE: we are being extremely lazy here in our calculations paying no regard to data completeness, etc. Please do not apply these methods blindly when doing your own analysis :)
# Create a vector of days for easily accessing day columns
days<-paste0("D", seq(1:31))
prcp.avg.month<-sapply(prcp.2015, function(x) rowMeans(x[,c(days)], na.rm=T))
# Remove outliers, incomplete stations
prcp.avg.month$US1NYTM0007<-NULL
prcp.avg.month$US1NYTM0027<-NULL
prcp.avg.month$US1NYTM0032<-NULL
# Take a quick look at the monthly data.
d<-reshape2::melt(prcp.avg.month)
names(d)<-c("prcp", "station")
head(d)
## prcp station
## 1 10.41935 US1NYSY0001
## 2 15.77778 US1NYSY0001
## 3 12.22581 US1NYSY0001
## 4 31.24000 US1NYSY0001
## 5 25.35484 US1NYSY0001
## 6 75.28571 US1NYSY0001
ggplot(d, aes(x = station, y = prcp)) +
geom_boxplot(fill="olivedrab3", color="olivedrab") +
theme(axis.title.x=element_blank(),
axis.text.x = element_text(angle = 90,
hjust = 1))
# Calculate annual averages and create a final
# data.frame of values
prcp.avg.ann<-sapply(prcp.avg.month, function(x) mean(x, na.rm=T))
prcp.fin<-data.frame(ID = names(prcp.avg.ann), prcpAvg = round(prcp.avg.ann, 2))
prcp.fin<-mutate(prcp.fin, ID = as.character(as.factor(ID)))
prcp.fin<-left_join(prcp.fin, ghcn_pts, by= "ID")
leaflet() %>% addProviderTiles("CartoDB.Positron") %>%
addCircleMarkers(data = prcp.fin, lng = ~long, lat = ~lat,
radius = ~prcpAvg/3, stroke=F,
popup = paste("<strong>Station ID</strong><br>", prcp.fin$ID,
"<br><br><strong>Average Annual<br>Precipitation</strong><br>",
prcp.fin$prcpAvg, " inches"), color = "#006B8C", fillOpacity = 0.7)
In a previous blog post we discuss working with rasters using the raster library. This post is helpful if you’re looking to crop, re-classify or re-sample your data. In addition we have a post on rendering a raster for use in Google Maps which highlights multiple methods (some successful, some not) of exporting and overlaying a raster in Google Maps. Given these posts we’re only going to lightly explore our elevation raster. The goal is to extract elevation values at the station locations and lastly create a final map!
# Take a quick look at the elevation distribution
neddat<-as.data.frame(ned_ithaca)
names(neddat)<-"elev"
ggplot(data=neddat, aes(elev)) +
geom_histogram(binwidth = 20, fill="sienna1", col="sienna3")
# Create SpatialPointsDataFrame from GHCN points
pts<-prcp.fin
coordinates(pts)<-~long+lat
proj4string(pts)<-"+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs"
# Project the points and elevation raster
# to NY State Plane
ithprj<-"+proj=tmerc +lat_0=40 +lon_0=-76.58333333333333 +k=0.9999375 +x_0=250000 +y_0=0 +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs"
ptsprj<-spTransform(pts, CRS(ithprj))
nedprj<-projectRaster(ned_ithaca, crs=ithprj)
# Extract elevation values at point locations
nedpts<-extract(nedprj, ptsprj)
# Add elevation values to the points data
prcp.fin<-cbind(prcp.fin, nedpts)
prcp.fin<-mutate(prcp.fin, nedpts = round(nedpts, 2))
head(prcp.fin)
## ID prcpAvg lat long nedpts
## 1 US1NYSY0001 26.58 42.3907 -76.7176 422.12
## 2 US1NYTM0002 27.39 42.5666 -76.5700 258.78
## 3 US1NYTM0004 29.81 42.5913 -76.3720 320.10
## 4 US1NYTM0005 30.10 42.5253 -76.3218 324.17
## 5 US1NYTM0006 24.96 42.3520 -76.3184 485.27
## 6 US1NYTM0015 24.42 42.3223 -76.5899 481.95
Here we’ll pull together all the pieces and create a relatively simple yet practical Leaflet map with our elevation raster and station locations. NOTE: depending on the size of your raster you may not be able to add the image to your Leaflet Map.
# Elevation colors were heavily borrowed from the
# Kansas Geological Survey's Elevation Map of Kansas:
# http://www.kgs.ku.edu/General/elevatMap.html
cols<-c("#06407F", "#317A9D", "#4ABEBB", "#40AE89", "#467B5D",
"#3C6D4D", "#1A572E", "#034C00", "#045D03", "#6C975F", "#6B823A",
"#88A237", "#C5D16B", "#DDE580", "#FFF6AE", "#FBCB81", "#F0B16A",
"#F2B16D", "#D18338", "#B16F33", "#825337", "#66422A", "#4F2C0C")
pal<-colorNumeric(cols, values(nedprj),
na.color = "transparent")
leaflet() %>% addProviderTiles("CartoDB.Positron") %>%
addRasterImage(nedprj, colors = pal, opacity = 0.6) %>%
addCircleMarkers(data = prcp.fin, lng = ~long,
lat = ~lat, radius = ~prcpAvg/3, stroke=F,
popup = paste("<strong>Station ID</strong><br>", prcp.fin$ID,
"<br><br><strong>Average Annual<br>Precipitation</strong><br>",
prcp.fin$prcpAvg, " inches<br><br><strong>Elevation</strong><br>",
prcp.fin$nedpts, " feet"), color = "#006B8C", fillOpacity = 0.7)